1 Introduction

Taken literally, galaxy clusters must be comprised of an overdensity of
galaxies. Almost as soon as the debate was settled on whether or not the "nebulae"
were extragalactic systems, it became clear that their distribution was not
random, with regions of very high over- and under-density. Thus, from a historical
perspective, it is important to discuss the detection of galaxy clusters through
their galactic components. Today we recognize that galaxies constitute a very
small fraction of the total mass of a cluster, but they are nevertheless some
of the clearest signposts for detection of these massive systems. Furthermore,
the extensive evidence for differential evolution between galaxies in clusters
and the field (discussed at length elsewhere in these proceedings) means that
it is imperative to quantify the galactic content of clusters.

Perhaps even more importantly, optical detection of galaxy clusters is now
inexpensive both financially and observationally. Large arrays of CCD
detectors on moderate telescopes can be utilized to perform all-sky surveys with
which we can detect clusters to z ~ 0.5. Using some of the efficient
techniques discussed later in this section, we can now survey hundreds of square
degrees for rich clusters at redshifts of order unity with 4-meter class
telescopes, and similar surveys, over smaller areas but with larger telescopes, are
finding group-mass systems to similar distances.

Looking to the future, ever larger and deeper surveys will permit the 
characterization of the cluster population to lower masses and higher red- 
shifts. Projects such as the Large Synoptic Survey Telescope (LSST) will map 
thousands of square degrees to very faint limits (29th magnitude per square 
arcsecond) in at least five filters, allowing the detection of clusters through 
their weak lensing signal (i.e. mass) as well as the visible galaxies. Ever more 
efficient cluster-finding algorithms are also being developed, in an effort to 
produce catalogs with low contamination by line-of-sight projections, high 
completeness, and well-understood selection functions. 

This chapter provides an overview of past and present techniques for
optical detection of galaxy clusters. It follows the progression of cluster
detection techniques through time, allowing readers to understand the development
of the field while explaining the variety of data and methodologies applied. 
Within each section we describe the datasets and algorithms used, pointing 
out their strengths and important limitations, especially with respect to the 
characterizability of the resulting catalogs. The next section provides a his- 
torical overview of pre-digital, photographic surveys that formed the basis for 
most cluster studies until the start of the twenty-first century. Section three 
describes the hybrid photo-digital surveys that created the largest current 
cluster catalogs. The fourth section is devoted to fully digital surveys, most 
specifically the Sloan Digital Sky Survey and the variety of methods used for 
cluster detection. We also describe smaller surveys, mostly for higher redshift
systems. The fifth section gives an overview of the different algorithms used 
by these surveys, with an eye towards future improvements. The concluding 
section discusses various tests that remain to be done to fully understand any 
of the catalogs produced by these surveys, so that they can be compared to 
simulations. 



2 Photographic Cluster Catalogs 

Even before astronomers had a full grasp of the distances to other galax- 
ies, the creators of the earliest catalogs of nebulae recognized that they were 
sometimes in spectacular groups. Messier and the Herschels observed the
companions of Andromeda and what we today know as the Pisces-Perseus
supercluster. With the invention of the wide-field Schmidt telescope, astronomers
undertook imaging surveys covering significant portions of the sky. These 
quickly revealed some of the most famous clusters, including Virgo, Coma, 
and Hydra. The earliest surveys relied on visual inspection of vast numbers 
of photographic plates, usually by a single astronomer. As early as 1938, 
Zwicky discussed such a survey based on plates from the 18" Schmidt tele- 
scope at Palomar. In 1942, Zwicky and Katz & Mulders published a pair of 
papers presenting the first algorithmic analyses of galaxy clustering from the 
Shapley-Ames catalog, using galaxies brighter than 12.7 mag. Examining counts
in cells, cluster morphologies, and clustering by galaxy type, these surveys laid 
the foundation for decades of galaxy cluster studies, but were severely limited 
by the very bright magnitude limit of the source material. Nevertheless, many 
fundamental properties of galaxy clusters were discovered. Zwicky, with his 
typical prescience, noted that elliptical galaxies are much more strongly
clustered than late-type galaxies (Figure 1), and attempted to use the structure
and velocity dispersions of clusters to constrain the age of the universe as well 
as galaxy masses. 

Fig. 1. The radial distribution of elliptical and spiral "nebulae" in the Virgo cluster.
The enhanced clustering of elliptical galaxies is apparent, and is used to construct
many modern cluster catalogs.

However, the true pioneering work in this field did not come until 1957,
upon the publication of a catalog of galaxy clusters produced by George Abell
as his Caltech Ph.D. thesis, which appeared in the literature the following year
(Abell 1958). Zwicky followed suit a decade later, with his voluminous
Catalogue of Galaxies and of Clusters of Galaxies (Zwicky, Herzog & Wild 1968).
However, Abell's catalog remained the most cited and utilized resource for 
both galaxy population and cosmological studies with clusters for over forty 
years. Abell used the red plates of the first National Geographic-Palomar Ob- 
servatory Sky Survey. These plates, each spanning ~ 6° on a side, covered the 
entire Northern sky, to a magnitude limit of $m_r \sim 20$. His extraordinary work
required the visual measurement and cataloging of hundreds of thousands of 
galaxies. To select clusters, Abell applied a number of criteria in an attempt
to produce a fairly homogeneous catalog. He required a minimum number 
of galaxies within two magnitudes of the third brightest galaxy in a cluster 
($m_3 + 2$), a fixed physical size within which galaxies were to be counted, a
maximum and minimum distance to the clusters, and a minimum galactic latitude
to avoid obscuration by interstellar dust. The resulting catalog, consisting of 
1,682 clusters in the statistical sample, remained the only such resource until 
1989. In that year, Abell, Corwin & Olowin (hereafter ACO) published an 
improved and expanded catalog, now including the Southern sky. These cata- 
logs have been the foundation for many cosmological studies over the last four 
decades, even with serious questions about their reliability. Despite the nu- 
merical criteria laid out to define clusters in the Abell and ACO catalogs, their 
reliance on the human eye and use of older technology and a single filter led 
to various biases. These include a bias towards centrally-concentrated clusters 
(especially those with cD galaxies), a relatively low redshift cutoff (z ~ 0.15; 
Bahcall & Soneira 1983), and strong plate-to-plate sensitivity variations.
Photometric errors and other inhomogeneities in the Abell catalog (Sutherland
1988, Efstathiou et al. 1992), as well as projection effects (Lucey 1983,
Katgert et al. 1996), are serious and difficult-to-quantify issues. These resulted
in early findings of excess large-scale power in the angular correlation func- 
tion (Bahcall & Soneira 1983), and later attempts to disentangle these issues 
relied on models to decontaminate the catalog (Sutherland 1988, Olivier et
al. 1990). The extent of these effects is also surprisingly poorly known; measures
of completeness and contamination in the Abell catalog disagree by factors of 
a few. For instance, Miller, Batuski & Slinglend (1999) claim that under- or 
over-estimation of richness is not a significant problem, whereas van Haarlem, 
Frenk & White (1997) suggest that one-third of Abell clusters have incorrect 
richnesses, and that one-third of rich (R > 1) clusters are missed. Unfor- 
tunately, some of these problems will plague any optically selected cluster 
sample, but objective selection criteria and a strong statistical understanding 
of the catalog can mitigate their effects. 

In addition to the Zwicky and Abell catalogs, a few others based on plate 
material have also been produced, such as Shectman (1985), from the galaxy 
counts of Shane & Wirtanen (1954), and a search for more distant clusters 
carried out on plates from the Palomar 200" by Gunn, Hoessel & Oke (1986; 
hereafter GHO). None of these achieved the level of popularity of the Abell 
catalog, although the GHO survey was one of the first to detect a significant 
number of clusters at moderate to high redshifts (0.15 < z < 0.9), and remains 
in use to this day. 

3 Hybrid Photo-Digital Surveys 

Only in the past ten years has it become possible to utilize the objectivity 
of computational algorithms in the search for galaxy clusters. These more 
modern studies required that plates be digitized, so that the data are in
machine readable form. Alternatively, the data had to be digital in origin, 
coming from CCD cameras. Unfortunately, this latter option provided only 
small area coverage, so the hybrid technology of digitized plate surveys blos- 
somed into a cottage industry, with numerous catalogs being produced in 
the past decade. All such catalogs relied on two fundamental data sets: the 
Southern Sky Survey plates, scanned with the Automatic Plate Measuring 
(APM) machine (Maddox et al. 1990) or COSMOS scanner (to produce the 
Edinburgh/Durham Southern Galaxy Catalog / EDSGC, Heydon-Dumbleton,
Collins & MacGillivray 1989), and the POSS-I, scanned by the APS group 
(Pennington et al. 1993). The first objective catalog produced was the
Edinburgh/Durham Cluster Catalog (EDCC, Lumsden et al. 1992), which covered
0.5 sr (~ 1,600 square degrees) around the South Galactic Pole (SGP). Later,
the APM cluster catalog was created by applying Abell-like criteria to select 
overdensities from the galaxy catalogs, and is discussed in detail in Dalton et 
al. (1997). More recent surveys, such as the EDCCII (Bramel, Nichol & Pope 
2000) did not achieve the large area coverage of DPOSS (see below), and
perhaps more importantly, are not nearly as deep. For instance, the EDCCII's
limiting magnitude is $b_J = 20.5$. For an L* elliptical this corresponds to a
limiting redshift of z ~ 0.23. The work by Odewahn & Aldering (1995), based
on the POSS-I, provided a Northern sky example of such a catalog, while
utilizing additional information (namely galaxy morphology). Some initial work
on this problem, using higher quality POSS-II data, was performed by Picard 
(1991) in his thesis. 

The largest, most recent, and likely the last photo-digital cluster survey 
is the Northern Sky Optical Cluster Survey (NoSOCS; Gal et al. 2000, 2003, 2006;
Lopes et al. 2004). This survey relies on galaxy catalogs created from scans 
of the second generation Palomar Sky Survey plates. The POSS-II (Reid et 
al. 1991) covers the entire northern sky ($\delta > -3^\circ$) with 897 overlapping fields
(each 6.5° square, with 5° spacings), and, unlike the old POSS-I, has no gaps in 
the coverage. Approximately half of the survey area is covered at least twice 
in each band, due to plate overlaps. Plates are taken in three bands:
blue-green, IIIa-J + GG395, $\lambda_{\rm eff} \sim 480$ nm; red, IIIa-F + RG610,
$\lambda_{\rm eff} \sim 650$ nm; and very near-IR, IV-N + RG9, $\lambda_{\rm eff} \sim 850$ nm.
Typical limiting magnitudes reached are $B_J \sim 22.5$, $R_F \sim 20.8$, and
$I_N \sim 19.5$, i.e., ~ 1 - 1.5 mag
deeper than POSS-I. The image quality is improved relative to POSS-I, and 
is comparable to the southern photographic sky surveys. The original survey 
plates are digitized at STScI, using modified PDS scanners (Lasker 1996). The 
plates are scanned with 15 μm (1.0") pixels, in rasters of 23,040 square, giving
~ 1 GB/plate, or ~ 3 TB of pixel data total for the entire digital survey.
The digital scans are processed, calibrated, and cataloged, with detection of 
all objects down to the survey limit, and star/galaxy classifications accurate 
to 90% or better down to ~ 1 mag above the detection limit (Odewahn et al.
2004). They are photometrically calibrated using extensive CCD observations
of Abell clusters (Gal et al. 2004a).
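
The data volume quoted above can be checked with simple arithmetic. The short Python sketch below is only a back-of-envelope consistency check: the raster size, pixel scale, and number of fields and bands come from the text, while the 2 bytes per pixel is an assumption (the bit depth is not stated here), and the function name is purely illustrative.

```python
# Back-of-envelope check of the DPOSS scan volume quoted in the text.
PIXELS_PER_SIDE = 23_040     # raster size per plate (from the text)
PIXEL_SCALE_ARCSEC = 1.0     # pixel scale in arcsec (from the text)
N_FIELDS = 897               # POSS-II fields (from the text)
N_BANDS = 3                  # J, F and N plates (from the text)
BYTES_PER_PIXEL = 2          # ASSUMPTION: 16-bit pixels, not stated in the text


def scan_volume():
    """Return (pixels per plate, GB per plate, TB in total, plate side in degrees)."""
    pixels = PIXELS_PER_SIDE ** 2
    gb_per_plate = pixels * BYTES_PER_PIXEL / 1e9
    tb_total = gb_per_plate * N_FIELDS * N_BANDS / 1e3
    side_deg = PIXELS_PER_SIDE * PIXEL_SCALE_ARCSEC / 3600.0
    return pixels, gb_per_plate, tb_total, side_deg


if __name__ == "__main__":
    pixels, gb, tb, side = scan_volume()
    print(f"{pixels} pixels, {gb:.2f} GB/plate, {tb:.2f} TB total, {side:.1f} deg per side")
```

Under this assumption each plate comes to roughly 1 GB and the full three-band survey to a little under 3 TB, in line with the figures quoted, and the scanned side of 6.4 degrees is consistent with the 6.5 degree fields.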

The resulting galaxy catalogs are used as an input to an adaptive kernel 
galaxy density mapping routine (discussed in §5), and photometric redshifts 
based on galaxy colors are calculated, along with cluster richnesses in a fixed 
absolute luminosity interval. The NoSOCS survey utilizes F (red) plates, with 
a limiting magnitude of $m_r = 20$. This corresponds to a limiting redshift of
0.33 for an L* elliptical galaxy. Because of the increase in $g - r$ color with
redshift, the APM would have to go as deep as $b_J = 22.0$ to reach the same
redshift from their data for early type galaxies. Similarly, even at lower redshift,
this implies that DPOSS can see ~ 0.5 - 1 mag deeper in the cluster luminosity
functions. Additionally, NoSOCS uses at least one color (two filters),
and a significantly increased amount of CCD photometric calibration data. 
The final catalog covers 11,733 square degrees, with nearly 16,000 candidate 
clusters (Figure 2), extending to z ~ 0.3, making it the largest such resource 
in existence. However, new CCD surveys, discussed in the next section, are 
about to surpass even this benchmark. 

4 Digital CCD Surveys 

Fig. 2. The sky distribution of NoSOCS (northern sky) and APM (southern sky)
candidate clusters in equatorial coordinates. The much higher density of NoSOCS
is due to its deeper photometry and lower richness limit.

With the advent of charge-coupled devices (CCDs), fully digital imaging in
astronomy became a reality. These detectors provided an order-of-magnitude
increase in sensitivity, linear response to light, small pixel size, stability, and
much easier calibration. The main drawback relative to photographic plates 
was (and remains) their small physical size, which permits only a small area 
(of order 10') to be imaged by a typical $2048^2$ pixel detector. As detector sizes
grew, and it became possible to build multi-detector arrays covering large 
areas, it became apparent that new sky surveys with this modern technology 
could be created, far surpassing their photographic precursors. Unfortunately, 
in the 1990s most modern telescopes did not provide large enough fields- 
of-view, and building a sufficiently large detector array to efficiently map 
thousands of square degrees was still challenging. 

Nevertheless, realizing the vast scientific potential of such a survey, an in- 
ternational collaboration embarked on the Sloan Digital Sky Survey (SDSS, 
York et al. 2000), which included construction of a specialized 2.5 meter tele- 
scope, a camera with a mosaic of 30 CCDs, a 640-fiber multi-object spectro- 
graph, a novel observing strategy, and automated pipelines for survey oper- 
ations and data processing. Main survey operations were completed in the 
fall of 2005, with over 8,000 square degrees of the northern sky imaged in five
filters to a depth of r' ~ 22.2 with calibration accurate to ~ 2-3%, as well
as spectroscopy of nearly one million objects. 

With such a rich dataset, many groups both internal and external to the 
SDSS collaboration have generated a variety of cluster catalogs, from both 
the photometric and the spectroscopic catalogs, using techniques including: 

1. Voronoi Tessellation (Kim et al. 2002) 

2. Overdensities in both spatial and color space (maxBCG, Annis et al. 1999) 

3. Subdividing by color and making density maps (Cut-and-Enhance, Goto
et al. 2002) 

4. The Matched Filter and its variants (Kim et al. 2002) 

5. Surface brightness enhancements (Bartelmann et al. 2002, Zaritsky et al. 
1997, 2002) 

6. Overdensities in position and color spaces, including redshifts (C4; Miller
et al. 2005) 

These techniques are described in more detail in §5. Each method gener- 
ates a different catalog, and early attempts to compare them have shown not 
only that the catalogs are quite distinct, but also that comparison of two 
photometrically-derived catalogs, even from the same galaxy catalogs, is not 
straightforward (Bahcall et al. 2003). 

In addition to the SDSS, smaller areas, to much higher redshift, have been 
covered by numerous deep CCD imaging surveys. Notable examples include 
the Palomar Distant Cluster Survey (PDCS, Postman et al. 1996), the ESO 
Imaging Survey (EIS, Lobo et al. 2000), Zaritsky et al. (1997), and many 
others. None of these surveys provides the angular coverage necessary for
large-scale structure and cosmology studies; they are specifically designed to
find rich clusters at high redshift. The largest such survey to date is the Red
Sequence Cluster Survey (RCS, Gladders et al. 2005), which is based on moderately
deep two-band imaging using the CFH12K mosaic camera on the CFHT 3.6m telescope
and covers ~ 100 square degrees. This area coverage makes it comparable to or
larger than X-ray surveys designed to detect clusters at z ~ 1. The use of the
red sequence of early-type galaxies makes this a very efficient survey, and the
methodology is described in §5.

5 Algorithms 

From our earlier discussion, it is obvious that many different mathematical and 
methodological choices must be made when embarking on an optical cluster 
survey. Regardless of the dataset and algorithms used, a few simple rules 
should be followed to produce a catalog that is useful for statistical studies of 
galaxy populations and for cosmological tests: 

1. Cluster detection should be performed by an objective, automated algo- 
rithm to minimize human biases and fatigue. 

2. The algorithm utilized should impose minimal constraints on the physical 
properties of the clusters, to avoid selection biases. If not, these biases 
must be properly characterized. 

3. The sample selection function must be well-understood, in terms of both 
completeness and contamination, as a function of both redshift and rich- 
ness. The effects of varying the cluster model on the determination of 
these functions must also be known. 

4. The catalog should provide basic physical properties for all the detected
clusters, including estimates of their distances and some mass-related
quantity (richness, luminosity, overdensity) such that specific subsamples
can be selected for future study.

This section describes many of the algorithms used to detect clusters in 
modern cluster surveys. No single one of these generates an 'optimal' cluster 
catalog, if such a thing can even be said to exist. Therefore, I provide some of 
the strengths and weaknesses of each technique. In addition to the methods 
discussed here, many other variants are possible, and in the future, joint
detection at multiple wavelengths (e.g. optical and X-ray, Schuecker et al. 2004)
may yield more complete samples to higher redshifts and lower mass limits, 
with less contamination. 



5.1 Counts in Cells 

The earliest cluster catalogs, like those of Abell, utilized a simple technique 
of counting galaxies in a fixed magnitude interval, in cells of a fixed physical 
or angular size. Indeed, Abell simply used visual recognition of galaxy over- 
densities, whose properties were then measured ex post facto in fixed physical 
cells. This technique was used by Couch et al. (1991) and Lidman & Peterson 
(1996) to detect clusters at moderate redshifts (z ~ 0.5), by requiring a spec- 
ified enhancement, above the mean background, of the galaxy surface density 
in a given area. This enhancement, called the contrast, is defined as 

$$\sigma_{cl} = \frac{N_{\rm cluster} - N_{\rm field}}{\sigma_{\rm field}} \qquad (1)$$

where $N_{\rm cluster}$ is the number of galaxies in the cell corresponding to the
cluster, $N_{\rm field}$ is the mean background counts and $\sigma_{\rm field}$ is the variance of the
field counts for the same area. The magnitude range and cell size used are 
parameters that must be set based on the photometric survey material and 
the type or distance of clusters to be found. For instance, Lidman & Peterson 
(1996) chose these parameters to maximize the contrast above background for 
a cluster at z = 0.5. Using the distribution of cell counts, one can
analytically determine the detection likelihood of a cluster with a given redshift and
richness (assuming a fixed luminosity function), given a detection threshold. 
The false detection rate is harder, if not impossible, to quantify, without run- 
ning the algorithm on a catalog with extensive spectroscopy. This is true for 
most of the techniques that rely on photometry alone. It is also possible to 
increase the contrast of clusters with the background by weighting galaxies 
based on their luminosities and positions. Galaxies closer to the cluster center 
are upweighted, while the luminosity weighting depends on both the cluster 
and field luminosity functions, as well as the cluster redshift. This scheme is 
similar to that used by the matched filter algorithm, detailed later. 
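
As an illustration of the contrast defined in equation (1), the following Python sketch bins galaxy positions into square cells and converts the counts into a contrast map. It is a minimal sketch under stated assumptions, not the procedure of any of the surveys cited: the field level and its scatter are estimated from the cells themselves (median and a MAD-based standard deviation), the scatter is taken as an rms rather than a variance so that the contrast is dimensionless, and the function name `cell_contrast` is hypothetical.

```python
import statistics


def cell_contrast(xs, ys, cell_size, extent):
    """Return a 2-D list of contrasts (N_cell - N_field) / sigma_field.

    xs, ys    : galaxy coordinates (same units as cell_size)
    cell_size : side length of the square counting cells
    extent    : (xmin, xmax, ymin, ymax) of the surveyed area
    """
    xmin, xmax, ymin, ymax = extent
    nx = int(round((xmax - xmin) / cell_size))
    ny = int(round((ymax - ymin) / cell_size))

    # Count galaxies per cell; objects outside the extent are ignored.
    counts = [[0] * ny for _ in range(nx)]
    for x, y in zip(xs, ys):
        ix = int((x - xmin) // cell_size)
        iy = int((y - ymin) // cell_size)
        if 0 <= ix < nx and 0 <= iy < ny:
            counts[ix][iy] += 1

    # ASSUMPTION: robust background estimate from the cells themselves,
    # so that a few rich cells do not bias the field level or its scatter.
    flat = [c for row in counts for c in row]
    n_field = statistics.median(flat)
    sigma_field = 1.4826 * statistics.median(abs(c - n_field) for c in flat)
    if sigma_field == 0:
        raise ValueError("field scatter is zero; use larger cells or more galaxies")

    return [[(c - n_field) / sigma_field for c in row] for row in counts]
```

Candidate clusters would then be the cells whose contrast exceeds a chosen threshold; the threshold, cell size, and magnitude range remain survey-dependent choices, as discussed above.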

This technique, although straightforward, has numerous drawbacks. First,
it relies on initial visual detection of overdensities, which are then quantified 
objectively. Since simple counts-in-cells methods use the galaxy distribution 
projected along the entire line of sight, chance alignments of poorer systems 
become more common, increasing the contamination. Optimizing the magni- 
tude range and cell size for a given redshift reduces the efficiency of detecting 
clusters at other redshifts, especially closer ones since their core radii are much 
larger. Setting the magnitude range typically assumes that the cluster galaxy 
luminosity function at the redshift of interest is the same as it is today, which 
is not true. Furthermore, single band surveys observe different portions of 
the rest frame spectrum of galaxies at different redshifts, altering the relative 
sensitivity to clusters over the range probed. Finally, the selection function 
can only be determined analytically for circular clusters with fixed luminosity 
functions. Given these issues, this technique is inappropriate for modern, deep 
surveys. 

5.2 Percolation Algorithms 

A majority of current cluster surveys rely on a smoothed map of projected 
galaxy density from which peaks are selected (see below). However, smoothing 
invariably reduces the amount of information being used, leading some authors
to employ percolation (or friends-of-friends, FOF) algorithms. In their sim- 
plest form, these techniques link pairs of galaxies that are separated by a 
distance less than some threshold (typically related to the mean galaxy sep- 
arations). Galaxies that have links in common are then assigned to the same 
group; once a group contains more than a specified number of members, it be- 
comes a candidate cluster. This technique was used by Dalton et al. (1997) to 
construct a cluster catalog from APM data. However, it is not typically used 
on two dimensional data, because the results of this method are very sensitive 
to the linking length, and can easily combine multiple clusters into long, fila- 
mentary structures. On the other hand, FOF algorithms are very commonly 
used for structure finding in three-dimensional data, especially N-body simu- 
lations (Davis et al. 1985, Efstathiou et al. 1988) and redshift surveys (Huchra 
& Geller 1982, Ramella et al. 2002). A variant of this technique utilizing
photometric redshifts has been recently proposed by Botzler et al. (2004).
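
The linking procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in any of the works cited: it pairs points by brute force, uses a union-find structure to merge linked galaxies into groups (an implementation choice made here), and the function name `friends_of_friends` and its parameters are hypothetical.

```python
import math


def friends_of_friends(points, linking_length, min_members):
    """Group points whose chains of pairwise separations are all below
    linking_length; return groups (lists of indices) with at least
    min_members members."""
    n = len(points)
    parent = list(range(n))          # union-find forest, one tree per group

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force pair search: adequate for a sketch, O(n^2) in general.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < linking_length:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [members for members in groups.values() if len(members) >= min_members]
```

Because links are transitive, three galaxies in a row separated by slightly less than the linking length end up in one group even though the outer two are not directly linked. This is the same behavior that merges neighboring clusters into filamentary structures when the linking length is too generous.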

5.3 Simple Smoothing Kernels 

Another objective and automated approach to cluster detection is the use of 
a smoothing kernel to generate a continuous density field from the discrete 
positions of galaxies in a catalog. For instance, Shectman (1985) used the 
galaxy counts of Shane & Wirtanen in 10' bins, smoothed with a very simple 
weighting kernel. A minimum number of galaxies within this smoothed region 
(in this case, 20) were then required to detect a cluster. This type of kernel is 
fixed in angular size and thus does not smooth clusters at different redshifts 
with consistent physical radii, making its sensitivity highly redshift dependent.
Similarly, it uses the full projected galaxy distribution (much as Abell did), 
and is thus insensitive to the different parts of the luminosity function (LF) sampled at different
redshifts. 

5.4 The Adaptive Kernel 

A slightly more sophisticated technique is to use an adaptive smoothing ker- 
nel (Silverman 1986). This technique uses a two-stage process to produce a 
density map. First, it produces a pilot estimate f(t) of the galaxy density
at each point t in the map. Based on this pilot estimate, it then
applies a smoothing kernel whose size changes as a function of the local den- 
sity, with a smaller kernel at higher density. This is achieved by defining a 
local bandwidth factor: 

$$\lambda_i = [f(t)/g]^{-\alpha}, \qquad (2)$$

where g is the geometric mean of f(t) and $\alpha$ is a sensitivity parameter that sets
the variation of kernel size with density. NoSOCS uses a sensitivity parameter
$\alpha = 0.5$, which results in a minimally biased final density estimate, and is
simultaneously more sensitive to local density variations than a fixed-width 
kernel (Silverman 1986). This is then used to construct the adaptive kernel 
estimate: 

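$$\hat{f}(t) = n^{-1} \sum_{i=1}^{n} h^{-2} \lambda_i^{-2} K\{h^{-1} \lambda_i^{-1} (t - X_i)\} \qquad (3)$$

where K is the kernel function, $X_i$ is the position of the i-th of the n
galaxies, and h is the bandwidth, which is a parameter that must be set based on
the survey properties.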
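
The two-stage procedure of equations (2) and (3) can be sketched as follows. This is a hedged illustration rather than the NoSOCS code: it assumes a two-dimensional Gaussian kernel, takes the pilot estimate to be a fixed-bandwidth estimate with the same h, and evaluates the pilot at the galaxy positions when computing the bandwidth factors (following Silverman 1986). All function names are hypothetical.

```python
import math


def gaussian_kernel(u2):
    """2-D Gaussian kernel as a function of the squared scaled distance."""
    return math.exp(-0.5 * u2) / (2.0 * math.pi)


def pilot_estimate(t, galaxies, h):
    """Fixed-bandwidth kernel estimate, used here as the pilot f(t)."""
    total = 0.0
    for gx, gy in galaxies:
        u2 = ((t[0] - gx) ** 2 + (t[1] - gy) ** 2) / h ** 2
        total += gaussian_kernel(u2) / h ** 2
    return total / len(galaxies)


def bandwidth_factors(galaxies, h, alpha=0.5):
    """Local bandwidth factors lambda_i = [f(X_i) / g]^(-alpha), eq. (2)."""
    pilot = [pilot_estimate(g, galaxies, h) for g in galaxies]
    # Geometric mean of the pilot values, computed in log space.
    g_mean = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    return [(p / g_mean) ** (-alpha) for p in pilot]


def adaptive_estimate(t, galaxies, h, lam):
    """Adaptive kernel estimate of eq. (3): each galaxy is smoothed with
    its own bandwidth h * lambda_i, smaller in denser regions."""
    total = 0.0
    for (gx, gy), lam_i in zip(galaxies, lam):
        s = h * lam_i
        u2 = ((t[0] - gx) ** 2 + (t[1] - gy) ** 2) / s ** 2
        total += gaussian_kernel(u2) / s ** 2
    return total / len(galaxies)
```

A density map would be built by evaluating `adaptive_estimate` on a grid of positions t. Galaxies in dense regions receive factors below unity and are smoothed less, which is what preserves compact cluster cores.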

The adaptive kernel was used to generate the Northern Sky Optical Cluster 
Survey (Gal et al. 2000, 2003, 2006). The smoothing size (in their case, 500" 
radius) is set to prevent over-smoothing the cores of higher redshift (z ~ 0.3) 
clusters, while avoiding fragmentation of most low redshift (z ~ 0.08) clusters. 
Because the input galaxy catalog is relatively shallow, and the redshift range 
probed is not very large, it is possible to do this. For deeper surveys, this is 
not practical, and therefore this technique cannot be used in its simplest form. 
Figure 3 demonstrates example density maps, showing the effect of varying 
the initial smoothing window. In this figure, four simulated clusters are placed 
into a simulated background, representing the expected range of detectability 
in the NoSOCS survey. There are two clusters at low z (0.08), and two at 
high z (0.24), with one poor and one rich cluster at each redshift (100 and 
333 total members, $N_{\rm gals}$ = 25 and 80 respectively).

Fig. 3. The effect of varying the initial smoothing window for the adaptive kernel on
cluster appearance. Each panel contains a simulated background with four simulated
clusters, as described in the text. The smoothing kernel ranges in size from 300" to
800" in 100" increments. Taken from Gal et al. (2003).

After a smooth density map is generated, cluster detection can be
performed analogously to object detection in standard astronomical images. In
NoSOCS, Gal et al. used SExtractor (Bertin & Arnouts 1996) to detect
density peaks. The tuning of parameters in the detection step is fundamentally
important in such surveys, and can be accomplished using simulated clusters
placed in the observed density field, from which the completeness and false
detection rates can be determined. Even so, this method involves many
adjustable parameters (the smoothing kernel size, sensitivity parameter, and all
the source detection parameters) such that it must be optimized with care 
for the data being used. Given an end-to-end cluster detection methodology, 
one can use simulations to determine the selection function's dependence on 
redshift, richness, and other cluster properties (see Gal et al. 2003 for details).
However, the measurements of cluster richness and redshift are done in a step
separate from detection, using the input galaxy catalogs, further
complicating this technique. The adaptive kernel is very fast and simple to implement,
making it suitable for all-sky surveys, but is only truly useful in situations 
where the photometry is poor, and the survey is not very deep, as is the case 
for NoSOCS. 

5.5 Surface Brightness Enhancements 

It is not necessary to have photometry for individual galaxies to detect clus- 
ters. A novel but difficult approach is to detect the localized cumulative sur- 
face brightness enhancement due to unresolved light from galaxies in distant 
clusters. This method was pioneered by Zaritsky et al. (1997, 2001), who 
showed that distant clusters could be detected using short integration times 
on small 1-m class telescopes. However, this method requires extremely
accurate flat-fielding, object subtraction, masking of bright stars, and excellent
data homogeneity. Once all detected objects are removed from a frame, and
nuisance sources such as bright stars masked, the remaining data are smoothed
with a kernel comparable to the size of clusters at the desired redshift. The 
completeness and contamination rates of such a catalog are extremely difficult 
to model. Thus, this technique is not necessarily appropriate for generating 
statistical catalogs for cosmological tests, but is an excellent, cost-effective 
means to find interesting objects for other studies. 

5.6 The Matched Filter 

With accurate photometry, and deeper surveys, one can use more sophisti- 
cated tools for cluster detection. As we will discuss later, color information 
is very powerful, but is not always available. However, even with single-band 
data, it is possible to simultaneously use the locations and magnitudes of 
galaxies. One such method is the matched filter (Postman et al. 1996), which 
models the spatial and luminosity distribution of galaxies in a cluster, and 
tests how well galaxies in a given sky region match this model for various red- 
shifts. As a result, it outputs an estimate of the redshift and total luminosity 
of each detected cluster as an integral part of the detection scheme. Following 
Postman et al. we can describe, at any location, the distribution of galaxies 
per unit area and magnitude $D(r, m)$ as a sum of the background and possible
cluster contributions:

$$D(r, m) = b(m) + \Lambda_{cl} P(r/r_c) \phi(m - m^*) \qquad (4)$$

Here, D is the number of galaxies per magnitude per arcsec$^2$ at magnitude
m and distance r from a putative cluster center. The background density is
b(m), and the cluster contribution is defined by a parameter $\Lambda_{cl}$ proportional
to its total richness, its differential luminosity function $\phi(m - m^*)$, and its
projected radial profile $P(r/r_c)$. The parameter $r_c$ is the characteristic cluster
radius, and $m^*$ is the apparent magnitude corresponding to the characteristic
galaxy luminosity. One can then construct
a likelihood for the data given this model, which is a function of the parameters
$r_c$, $m^*$, and $\Lambda_{cl}$. Because two of these parameters, especially $m^*$, are sensitive to
the redshift, one obtains an estimated redshift when maximizing the likelihood
relative to this parameter. The algorithm outputs the richness $\Lambda_{cl}$ at each
redshift tested, and thus provides an integrated estimator of the total cluster
richness. The luminosity function used by Postman et al. is a Schechter (1976)
function with a power law cutoff applied to the faint end, while they use a 
circularly symmetric radial profile with core and cutoff radii (see their eqn. 
19). 
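
The model of equation (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Postman et al. implementation: the faint-end power law cutoff is omitted, the faint-end slope `alpha = -1.1` is an arbitrary illustrative value, the radial profile is one plausible core-plus-cutoff form, and the function names and the `background` callable are hypothetical.

```python
import math


def schechter_mag(m, m_star, alpha=-1.1):
    """Schechter (1976) luminosity function per unit magnitude (unnormalized).

    alpha is an illustrative faint-end slope; the faint-end power law cutoff
    mentioned in the text is not applied here.
    """
    x = 10.0 ** (-0.4 * (m - m_star))  # luminosity in units of L*
    return 0.4 * math.log(10.0) * x ** (alpha + 1.0) * math.exp(-x)


def radial_profile(r, r_core, r_cut):
    """Circularly symmetric profile with a core radius and a hard cutoff.

    Assumed form: it falls smoothly to zero at r_cut and is zero beyond it.
    """
    if r >= r_cut:
        return 0.0
    return (1.0 / math.sqrt(1.0 + (r / r_core) ** 2)
            - 1.0 / math.sqrt(1.0 + (r_cut / r_core) ** 2))


def model_density(r, m, background, richness, r_core, r_cut, m_star):
    """D(r, m) = b(m) + Lambda_cl * P(r / r_c) * phi(m - m*)."""
    cluster = richness * radial_profile(r, r_core, r_cut) * schechter_mag(m, m_star)
    return background(m) + cluster
```

In a full matched filter, r_core, r_cut and m_star would be recomputed for each trial redshift, and the likelihood of the observed galaxies under `model_density` would be maximized over the richness.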

Like the adaptive kernel, this method produces density maps on which 
source detection must still be run. These maps have a grid size set by the user, 
typically of order half the core radius at each redshift used, with numerous 
maps for each field, one for each redshift tested. The goal of the matched filter 
is to improve the contrast of clusters above the background, by convolving 
with an 'optimal' filter, and also to output redshift and richness estimates. 
Given a set of density maps, one can use a variety of detection algorithms 
to select peaks. A given cluster is likely to be detected in multiple maps (at 
different redshifts) of the same region; its redshift is estimated by finding 
the filter redshift at which the peak signal is maximized. By using multiple 
photometric bands, one can run this algorithm separately on each band and 
improve the reliability of the catalogs. The richness of a cluster is measured 
from the density map corresponding to the cluster redshift, and represents 
approximately the equivalent number of L* galaxies in the cluster. 
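
The redshift estimate described above amounts to a maximization over the trial redshifts. A minimal sketch, assuming the peak signal of one candidate has already been measured in each filtered map (the function and argument names are hypothetical):

```python
def estimate_redshift(z_grid, peak_signals):
    """Return (z_best, signal_best) for one candidate cluster.

    z_grid: trial filter redshifts, one per density map.
    peak_signals: the candidate's peak signal measured in each map.
    """
    if len(z_grid) != len(peak_signals) or len(z_grid) == 0:
        raise ValueError("z_grid and peak_signals must be non-empty and equal in length")
    best = max(range(len(z_grid)), key=lambda k: peak_signals[k])
    return z_grid[best], peak_signals[best]
```

The richness would then be read from the map at the selected redshift.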

The matched filter is a very powerful cluster detection technique. It can 
handle deep surveys spanning a large redshift range, and provides redshift and 
richness measures as an innate part of the procedure. The selection function 
can be estimated using simulated clusters, as was done in significant detail 
by Postman et al. However, the technique relies on fixed analytic luminosity 
functions and radial profiles for the likelihood estimates. Thus, clusters which 
have properties inconsistent with these input functions will be detected at 
lower likelihood, if at all. While this is not likely to be an issue at low to 
moderate redshifts, as the population of clusters becomes increasingly merger 
dominated at z ≳ 0.8 (Cohn et al. 2005), these simple representations will 
fail. Similarly, the cluster and field LF both evolve with redshift, which can 
affect the estimated redshift. Also, as the redshifts and k-corrections become 
large, one samples a very different region of the LF than at low redshift. 
Nevertheless, this remains one of the best techniques for 
cluster detection in moderately deep surveys. 

5.7 Hybrid and Adaptive Matched Filter 

The matched filter can be extended to include estimated (photometric) 
or measured (spectroscopic) redshifts. This extension has been called the 
adaptive matched filter (AMF; Kepner et al. 1999). The term "adaptive" here 
refers to this method's ability to accept 2-dimensional (positions and magnitudes), 
2.5-dimensional (positions, magnitudes, and estimated redshifts), and 
3-dimensional (positions, magnitudes, and redshifts) data, adapting to the 
redshift errors. In implementation, this technique uses a two-stage method, 
first maximizing the cluster likelihood on a coarse grid of locations and red- 
shifts, and then refining the redshift and richness on a finer grid. Unlike the 
standard matched filter, the AMF evaluates the likelihood function at each 
galaxy position, and not on a fixed grid for each redshift interval. Thus, for 
each galaxy, the output includes a likelihood that there is a cluster centered 
on this galaxy, and the estimated redshift. 

The inclusion of photometric redshifts should substantially improve detec- 
tion of poor clusters, which is very important since most galaxies live in poor 
systems, and these are suspected to be sites for significant galaxy evolution. 
However, Kim et al. (2002), using SDSS data, found that the simple matched 
filter is more efficient at detecting faint clusters, while the AMF estimated 
cluster properties more accurately. The matched filter performs better for de- 
tection because the significance threshold for finding candidates is redshift 
dependent, determined separately for each map produced in different redshift 
intervals. The AMF, on the other hand, finds peaks first in redshift space, 
and then selects candidates using a universal threshold. Thus, they propose a 
hybrid matched filter (HMF), using the matched filter to detect candidate 
clusters, and the AMF to obtain their properties. 

5.8 Cut-and-Enhance 




Fig. 4. An enhanced map of the galaxy distribution in the SDSS Early Data Release, 
after applying the g* - r* - i* color-color cut. Detected clusters are circled. Taken 
from Goto et al. (2002). 

Despite the popularity of matched filter algorithms for cluster detection, 
their assumptions of a radial profile and luminosity function are cause for 
concern. Thus, development of semi-parametric detection methods remains a 
vibrant area of research. While the adaptive kernel described earlier is such 
a technique, more sophisticated algorithms are possible, especially with the 
inclusion of color information. One such technique is the Cut-and-Enhance 
method (Goto et al. 2002), which has been applied to SDSS data. This method 
relies on the presence of the red sequence in clusters, applying a variety of 
color and color-color cuts to generate galaxy subsamples which should span 
different redshift ranges. Within each cut, pairs of galaxies with separations 
less than 5' are replaced by Gaussian clouds, which are then summed to gen- 
erate density maps. In this technique, the presence of many close pairs (as in a 
high redshift cluster) yields a more compact cloud, making it easier to detect, 
and thus possibly biasing the catalog against low-z clusters. As with the AK 
technique, this method yields a density map on which object detection must 
be performed; Goto et al. (2002) utilize SExtractor. Once potential clusters 
are detected in the maps made using the various color cuts, these catalogs 
must be merged to produce a single list of candidates. Redshift and richness 
estimates are performed a posteriori, as they are with the AK. Similar to the 
AK, there are many tunable parameters which make this method difficult to 
optimize. 
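
The enhancement step can be illustrated with a short sketch. It assumes that each close pair contributes one unit-volume Gaussian centered on the pair midpoint, with a width equal to the pair separation. That width choice is our assumption, made so that closer pairs yield more compact clouds as described above. The function name and arguments are hypothetical, and the color cuts are taken to have been applied beforehand.

```python
import numpy as np


def enhance_map(x, y, max_sep, grid_x, grid_y):
    """Sum one Gaussian cloud per close galaxy pair onto a regular grid.

    x, y: galaxy positions after a color cut (e.g. in arcmin).
    max_sep: pairs with separations at or beyond this value are ignored.
    Assumption: each cloud has sigma equal to the pair separation.
    """
    gx, gy = np.meshgrid(grid_x, grid_y)
    density = np.zeros(gx.shape)
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            sep = np.hypot(x[i] - x[j], y[i] - y[j])
            if sep == 0.0 or sep >= max_sep:
                continue
            cx = 0.5 * (x[i] + x[j])
            cy = 0.5 * (y[i] + y[j])
            r2 = (gx - cx) ** 2 + (gy - cy) ** 2
            # Unit-volume 2-D Gaussian: tighter pairs give sharper, taller clouds.
            density += np.exp(-0.5 * r2 / sep**2) / (2.0 * np.pi * sep**2)
    return density
```

The double loop is quadratic in the number of galaxies; a real catalog would need a neighbor search (for example a k-d tree) to find the close pairs.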

5.9 Voronoi Tessellation 




Fig. 5. Voronoi Tessellation of galaxies with 17.0 < m_r < 18.5 in a DPOSS field 
(horizontal axis: X in pixels). Each triangle represents a galaxy surrounded by 
its associated Voronoi cell (indicated by the polygons). Excised areas (due to 
bright objects) are shown as rectangles. Taken from Lopes et al. (2004). 

Considering a distribution of particles it is possible to define a charac- 
teristic volume associated with each particle. This is known as the Voronoi 
volume, whose radius is of the order of the mean particle separation. The 
complete division of a region into these volumes is known as Voronoi Tessellation 
(VT), and it has been applied to a variety of astronomical problems, 
and in particular to cluster detection by Kim et al. (2002) and Ramella et al. 
(2001). As pointed out by the latter, one of the main advantages of employing 
VT to look for galaxy clusters is that this technique does not distribute the 
data in bins, nor does it assume a particular source geometry intrinsic to the 
detection process. The algorithm is thus sensitive to irregular and elongated 
structures. 




The parameter of interest in this case is the galaxy density. When applying 
VT to a galaxy catalog, each galaxy is considered as a seed and has a Voronoi 
cell associated to it. The area of this cell is interpreted as the effective area a 
galaxy occupies in the plane. The inverse of this area gives the local density at 
that point. Galaxy clusters are identified by high density regions, composed 
of small adjacent cells, i.e., cells small enough to give a density value higher 
than the chosen density threshold. An example of Voronoi Tessellation applied 
to a galaxy catalog for one DPOSS field is presented in Figure 5. For clarity, 
we show only galaxies with 17.0 < m_r < 18.5. 
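
The density estimate described above, the inverse of each galaxy's Voronoi cell area, can be sketched with SciPy. This is an illustration only, not the code used by any of the cited authors; the function name is hypothetical. Galaxies whose cells are unbounded (at the edge of the field) are given NaN rather than a density.

```python
import numpy as np
from scipy.spatial import ConvexHull, Voronoi


def voronoi_density(points):
    """Local density for each 2-D point as 1 / (area of its Voronoi cell).

    Points whose Voronoi cell is unbounded receive NaN.
    """
    points = np.asarray(points, dtype=float)
    vor = Voronoi(points)
    density = np.full(len(points), np.nan)
    for i, region_index in enumerate(vor.point_region):
        region = vor.regions[region_index]
        if len(region) < 3 or -1 in region:
            continue  # unbounded or degenerate cell
        # Voronoi cells are convex, so the hull of the cell vertices is the cell.
        # For 2-D input, ConvexHull.volume is the enclosed area.
        area = ConvexHull(vor.vertices[region]).volume
        if area > 0.0:
            density[i] = 1.0 / area
    return density
```

Candidate clusters would then be groups of adjacent cells whose densities exceed the chosen threshold.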

Fig. 7. Richness (top) and estimated redshift (bottom) distributions for clusters 
detected in DPOSS by only the VT code (dash-dotted line), only the AK code 
(dotted line), and by both methods (heavy solid line). Taken from Lopes et al. 
(2004). 

Once such a tessellation is created, candidate clusters are identified based 
on two criteria. The first is the density threshold, which is used to identify 
fluctuations as significant overdensities over the background distribution, and 
is termed the search confidence level (scl). The second criterion rejects candidates 
from the preliminary list using statistics of Voronoi Tessellation for 
a Poissonian distribution of particles, by computing the probability that an 
overdensity is a random fluctuation. This is called the rejection confidence 
level (rcl). Kim et al. (2002) used the color-magnitude relation for cluster 
ellipticals to divide the galaxy catalog into separate redshift bins, and ran the 
VT code on each bin. Candidates in each slice are identified by requiring a 
minimum number N_hdg of galaxies having overdensities δ greater than some 
threshold δ_c, within a radius of 0.7 h^-1 Mpc. The candidates originating in 
different bins are then cross-correlated to filter out significant overlaps and 
produce the final catalog. Ramella et al. (2001) and Lopes et al. (2004) follow 
a different approach, as they do not have color information. Instead, they 
use the object magnitudes to minimize background/foreground contamination 
and enhance the cluster contrast, as follows: 

1. The galaxy catalog is divided into different magnitude bins, starting at the 
bright limit of the sample and shifting to progressively fainter bins. The 
step size adopted is derived from the photometric errors of the catalog. 
2. The VT code is run using the galaxy catalog for each bin, resulting in a 
catalog of cluster candidates associated with each magnitude slice. 

3. The centroid of a cluster candidate detected in different bins will change 
due to the statistical noise of the foreground/background galaxy distri- 
bution. Thus, the cluster catalogs from all bins are cross-matched, and 
overdensities are merged according to a set criterion, producing a com- 
bined catalog. 

4. A minimum number (N_min) of detections in different bins is required in 
order to consider a given fluctuation as a cluster candidate. N_min acts as 
a final threshold for the whole procedure. After this step, the final cluster 
catalog is complete. 
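
Steps 3 and 4 above can be sketched as follows. The merging criterion is not specified in the text, so this example assumes a simple greedy match: a detection joins the first group whose running centroid lies within `match_radius`. The function name, the input layout, and the matching rule are all hypothetical.

```python
import math


def merge_detections(detections, match_radius, n_min):
    """Cross-match per-bin detections and apply the N_min threshold.

    detections: iterable of (x, y, bin_index) tuples, one per candidate
    found in a given magnitude bin.
    Returns a list of (x_mean, y_mean, n_bins) for merged candidates
    detected in at least n_min distinct bins.
    """
    groups = []
    for x, y, bin_index in detections:
        for group in groups:
            cx = group["sx"] / group["n"]
            cy = group["sy"] / group["n"]
            if math.hypot(x - cx, y - cy) <= match_radius:
                group["sx"] += x
                group["sy"] += y
                group["n"] += 1
                group["bins"].add(bin_index)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append({"sx": x, "sy": y, "n": 1, "bins": {bin_index}})
    return [
        (g["sx"] / g["n"], g["sy"] / g["n"], len(g["bins"]))
        for g in groups
        if len(g["bins"]) >= n_min
    ]
```

Because the matching is greedy, the result can depend on the order of the input detections; a production code would likely use a more careful criterion.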

Kim et al. (2002) and Lopes et al. (2004) compare the performance of 
their VT algorithms with the HMF and adaptive kernel, respectively. Figure 
6 (taken from Kim et al. 2002) shows the absolute recovery rates of clusters 
in four different ranges of cluster parameters for the HMF (solid line) and the 
VT (dashed line). Both algorithms agree very well for clusters with the highest 
signals (rich, low redshift), but the VT does slightly better for the thresholds 
determined from the uniform background case. Similarly, Lopes et al. (2004) 
find that the VT algorithm performs better for poor, nearby clusters, while 
the adaptive kernel goes deeper when detecting rich systems, as seen in Figure 
7, where the VT-only detections are preferentially poor and low redshift, and 
the AK-only detections are richer and at high redshift. 

5.10 maxBCG 

The maxBCG algorithm, developed for use on SDSS data (Annis et al. 2002, 
Hansen et al. 2005), is another technique that relies on the small color dis- 
persion of early-type cluster galaxies. The brightest of the cluster galaxies 
(BCGs) have predictable colors and magnitudes out to redshifts of order unity. 
Unlike many of the other techniques discussed above, maxBCG does not generate 
density maps. Instead, for each galaxy it calculates the likelihood, as a 
function of redshift, that the galaxy is a BCG, based on its colors and the 
presence of a red sequence from the surrounding objects. This is calculated as 

\mathcal{L}_{max} = \max_z \mathcal{L}(z); \quad \mathcal{L}(z) = \mathcal{L}_{BCG} + \log N_{gal} \qquad (5) 

where L_BCG is the likelihood, at redshift z, that a galaxy is a BCG, based 
on its colors and luminosity, and N_gal is the number of galaxies within 1 h^-1 
Mpc with colors and magnitudes consistent with the red sequence (i.e. within 
0.1 mag of the mean BCG color at the redshift being tested). This procedure 
results in a maximum likelihood and redshift for each galaxy in the catalog. 
The peaks in the L_max distribution are then selected as the candidate clusters. 
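
Equation (5) can be illustrated for a single galaxy as below. This sketch assumes that the BCG likelihood term and the red sequence counts have already been evaluated on a grid of trial redshifts, and that the logarithm is natural (the base is not specified in the text). The function name and inputs are hypothetical, not part of the maxBCG code.

```python
import math


def max_bcg_likelihood(z_grid, l_bcg, n_gal):
    """Maximize L(z) = L_BCG(z) + log N_gal(z) over trial redshifts.

    z_grid: trial redshifts.
    l_bcg: BCG likelihood term at each trial redshift.
    n_gal: number of red sequence neighbors at each trial redshift.
    Returns (L_max, z_best), or None if no redshift has N_gal > 0.
    """
    best = None
    for z, l_term, n in zip(z_grid, l_bcg, n_gal):
        if n <= 0:
            continue  # log N_gal is undefined without red sequence neighbors
        total = l_term + math.log(n)  # natural log assumed
        if best is None or total > best[0]:
            best = (total, z)
    return best
```

Running this for every galaxy gives the L_max distribution from which peaks are selected.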

Fig. 8. SDSS color-magnitude diagram of observed g - r vs. apparent i band for 
galaxies near a rich cluster at z = 0.15. Ellipses represent 1, 2, and 3σ contours 
around the mean BCG color and magnitude at that redshift. The dotted line indicates 
the track of BCG color and magnitude as a function of redshift. The horizontal 
lines and vertical dashed line show the region of inclusion for N_gal determination. 
Taken from Hansen et al. (2005). 

This algorithm appears to be extremely powerful for selecting clusters in 
the SDSS. Simulations suggest that maxBCG recovers and correctly estimates 
the richness for greater than 90% of clusters and groups present with N_gal > 15 
out to z = 0.3, with an estimated redshift dispersion of δz = 0.02. As long as 
one can obtain a sufficiently deep photometric catalog, with the appropriate 
colors to map the red sequence, this technique can be used to very efficiently 
detect clusters. Like all methods that rely on the presence of a red sequence, 
it will eventually fail at sufficiently high redshifts, where the cluster galaxy 
population becomes more heterogeneous. However, clusters detected out to 
z ~ 1-1.5, even using non-optical techniques, still show a red sequence, 
albeit with larger scatter, which will reduce the efficiency of this method. 
Additionally, the definition of N_gal as the number of red sequence galaxies 
may introduce a bias, as poorer, less concentrated, or more distant clusters 
have less well defined color-magnitude relations, and the luminosity functions 
for clusters vary with richness as well (Figure 10 of Hansen et al. 2005). 

5.11 The Cluster Red Sequence Method 

As we have discussed already, the existence of a tight color-magnitude relation 
for cluster galaxies provides a mechanism for reducing fore- and background 
contamination, enhancing cluster contrast, and estimating redshifts in cluster 
surveys. Because the red sequence is such a strong indicator of a cluster's pres- 
ence, and is especially tight for the brighter cluster members, it can be used to 
detect clusters to high redshifts (z ~ 1) with comparatively shallow imaging, 
if an optimal set of photometric bands is chosen. This is the idea behind the 
Cluster Red Sequence (CRS; Gladders & Yee 2000) method, utilized by the 
Red Sequence Cluster Survey (RCS; Gladders et al. 2005). Figure 9a shows 
model color-magnitude tracks for different galaxy types for 0.1 < z < 1.0. 
The cluster ellipticals are the reddest objects at all redshifts. Even more importantly, 
if the filters used straddle the 4000 Å break at a given redshift, the 
cluster ellipticals at that redshift are redder than all galaxies at all lower redshifts. 
The only contaminants are more distant, bluer galaxies, eliminating 
most of the foreground contamination found in imaging surveys. The change 
of the red sequence color with redshift at a fixed apparent magnitude also 
makes it a very useful redshift estimator (Lopez-Cruz 2004). 




Fig. 9. Left: Simulated (V - I_c)_AB vs. (I_c)_AB color-magnitude diagram. Model 
apparent magnitudes and colors at various redshifts for several types of galaxies 
at a fixed M_I of -22 are shown. The dotted lines connect galaxies at the same 
redshift. Solid near-horizontal lines show the expected slope of the red sequence at 
each redshift. Right: CMD of a CNOC2 Redshift Survey Patch, with dashed lines 
showing various color CRS slices. The galaxy symbols are sized by the probability 
that they belong to the color slice defined by the solid lines. Taken from Gladders 
& Yee (2000). 

Gladders & Yee generate a set of overlapping color slices based on models 
of the red sequence. A subset of galaxies is selected that belong to each slice, 
based on their magnitudes, colors, color errors, and the models. A weight 
for each chosen galaxy is computed, based on the galaxy magnitude and the 
likelihood that the galaxy belongs to the color slice in question (Figure 9b). A 
surface density map is then constructed for each slice using a fixed smoothing 
kernel, with a scale radius of 0.33 h^-1 Mpc. All the slices taken together form 
a volume density in position and redshift. Peaks are then selected from this 
volume. Gladders et al. (2005) present the results of this technique applied to 
the first two RCS patches. 
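
The slice weighting and smoothing can be sketched as follows. This is an illustration under assumptions, not the Gladders & Yee code: the color errors are taken to be Gaussian, the slice is treated as a simple color interval at the galaxy's magnitude, the smoothing kernel is a Gaussian stand-in for their kernel, and any magnitude-dependent weight is assumed to be folded into `weights` by the caller. All names are hypothetical.

```python
import math

import numpy as np


def slice_probability(color, color_err, c_lo, c_hi):
    """Probability that a galaxy's true color lies in the slice [c_lo, c_hi],
    assuming a Gaussian color error."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return cdf((c_hi - color) / color_err) - cdf((c_lo - color) / color_err)


def slice_density_map(x, y, weights, grid_x, grid_y, scale):
    """Weighted surface density map for one color slice.

    x, y: galaxy positions in the same units as scale (e.g. h^-1 Mpc at
    the slice redshift); weights: per-galaxy weights.
    A fixed-width Gaussian kernel is assumed for the smoothing.
    """
    gx, gy = np.meshgrid(grid_x, grid_y)
    density = np.zeros(gx.shape)
    for xi, yi, wi in zip(x, y, weights):
        r2 = (gx - xi) ** 2 + (gy - yi) ** 2
        density += wi * np.exp(-0.5 * r2 / scale**2)
    return density
```

Stacking the maps from all slices along the redshift axis gives the volume in which peaks are sought.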

Fig. 10. r - i vs. i color-magnitude and r - i vs. i - z color-color diagrams for 
objects in the Cl1604 field. 

In a similar vein, the High Redshift Large Scale Structure Survey (Lubin 
et al. 2006, Gal et al. 2004b, 2005) uses deep multicolor photometry around 
known clusters at z > 0.7 to search for additional large scale structure. They 
apply color and color-color cuts to select galaxies with the colors of spectroscopically 
confirmed members in the original clusters. The selected galaxies 
are used to make adaptive kernel density maps from which peaks are selected. 
This technique was applied to the Cl1604 supercluster at z ~ 0.9. Starting 
with two known clusters with approximately 20 spectroscopic members, there 
are now a dozen structures with 360 confirmed members known in this su- 
percluster. These galaxies typically follow the red sequence, but as can be 
seen in Figure 10, the scatter is very large, and many cluster or supercluster 
members are actually bluer than the red sequence at this redshift. Figure 10 
shows the r - i vs. i color-magnitude and r - i vs. i - z color-color diagrams 
for objects in a ~30' square region around the Cl1604 supercluster, 
with all known cluster members shown in red, and the color selection boxes 
marked. Figure 11 shows the density map for this region, with two different 
significance thresholds, and the clusters comprising the supercluster marked. 
Clearly, in regions such as this, traditional cluster detection techniques will 
yield incorrect results, combining multiple clusters, and measuring incorrect 
redshifts and richnesses. Figure 12 shows a 3-d map of the spectroscopically 
confirmed supercluster members, revealing the complex nature of this struc- 
ture. Dots are scaled with galaxy luminosity. While only ~ 10 Mpc across 
on the sky, the apparent depth of this structure is nearly 10 times greater, 
making it comparable to the largest local superclusters. 

Fig. 11. Density maps of galaxies meeting the z ~ 0.9 red galaxy criteria in the 
Cl1604 field. 

Fig. 12. Three dimensional spatial distribution of the spectroscopically confirmed 
Cl1604 supercluster members. Dots are scaled by galaxy luminosity. 

6 Conclusions 

It is clear that there exist many methods for detecting clusters in optical 
imaging surveys. Some of these are designed to work on very simple, single-band 
data (AK, Matched Filter, VT), but will work on multicolor data as 
well. Others, such as maxBCG and the CRS method, rely on galaxy colors 
and the red sequence to potentially improve cluster detection and reduce 
contamination by projections and spurious objects. Very little work has been 
done to compare these techniques, with the exceptions of Kim et al. (2001), 
Bahcall et al. (2003) and Lopes et al. (2004), each of whom compared the 
results of only two or three algorithms. Even from these tests it is clear that 
no single technique is perfect, although some (notably those that use colors) 
are clearly more robust. Certainly any program to find clusters in imaging 
data must consider the input photometry when deciding which, if any, of 
these methods to use. 

One of the most vexing issues facing cluster surveys is our inability to 
compare directly to large scale cosmological simulations. Most such simu- 
lations are N-body only, but have perfect knowledge of object masses and 
positions. Thus, it is possible to construct algorithms to detect overdensi- 
ties based purely on mass, but it is not possible to obtain the photometric 
properties of these objects! Recent work, such as the Millennium Simulation 
(Springel et al. 2005), is approaching this goal. It is necessary to extract from 
these simulations the magnitudes of galaxies in filters used for actual surveys, 
and run the various cluster detection algorithms on these simulated galaxy 
catalogs. The results can then be compared to those of pure mass selection, 
and the redshift-, structure- and mass-dependent biases understood. Ideally, 
this should be done for many large simulations using different cosmologies, 
since the galaxy evolution and selection effects will vary. Such work is funda- 
mental if we are to use the evolution of the mass function of galaxy clusters 
for cosmology. As deeper and larger optical surveys, such as LSST, and other 
techniques such as X-ray and Sunyaev-Zel'dovich effect observations become 
available, the need for these simulations becomes ever greater. 